Learning Distributed Representations of Phrases
نویسنده
چکیده
Recent work in Natural Language Processing has focused on learning distributed representations of words, phrases, sentences, paragraphs and even whole documents. In such representations, text is represented using multi-dimensional vectors and similarity between pieces of text can be measured using similarity between such vectors. In this project I focus my attention on learning representations of phrases sequences of two or more words that can function as a single unit in a sentence.
منابع مشابه
"The Sum of Its Parts": Joint Learning of Word and Phrase Representations with Autoencoders
Recently, there has been a lot of effort to represent words in continuous vector spaces. Those representations have been shown to capture both semantic and syntactic information about words. However, distributed representations of phrases remain a challenge. We introduce a novel model that jointly learns word vector representations and their summation. Word representations are learnt using the ...
متن کاملDistributed Representations of Words and Phrases and their Compositionality
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup ...
متن کاملMultilingual Distributed Representations without Word Alignment
Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks. Recent work has shown how compositional semantic repres...
متن کاملLearning Continuous Phrase Representations for Translation Modeling
This paper tackles the sparsity problem in estimating phrase translation probabilities by learning continuous phrase representations, whose distributed nature enables the sharing of related phrases in their representations. A pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent space, where their translation score is computed ...
متن کاملMultivariate Gaussian Document Representation from Word Embeddings for Text Categorization
Recently, there has been a lot of activity in learning distributed representations of words in vector spaces. Although there are models capable of learning high-quality distributed representations of words, how to generate vector representations of the same quality for phrases or documents still remains a challenge. In this paper, we propose to model each document as a multivariate Gaussian dis...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014